CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
We propose a method to control the attributes of Language Models (LMs) for
the text generation task using Causal Average Treatment Effect (ATE) scores and
counterfactual augmentation. We explore this method in the context of LM
detoxification and propose the Causally Fair Language (CFL) architecture for
detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is
based on a Structural Causal Model (SCM) that is mathematically transparent and
computationally efficient as compared with many existing detoxification
techniques. We also propose several new metrics that aim to better understand
the behaviour of LMs in the context of toxic text generation. Further, we
achieve state-of-the-art performance on toxic degeneration, as measured on the
RealToxicityPrompts (RTP) benchmark. Our experiments show that CFL achieves this
detoxification without much impact on the model perplexity. We also show that
CFL mitigates the unintended bias problem through experiments on the BOLD
dataset.
Comment: 19 pages, 10 figures. Findings of ACL 202
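
As a rough illustration of the token-level ATE idea in the abstract above, the sketch below scores a token by how much a toxicity score drops, on average, when that token is swapped for counterfactual replacements. Everything here is an assumption made for illustration: `toy_toxicity`, `TOY_TOXIC_WORDS`, and `token_ate` are hypothetical names, and the toy lexicon scorer stands in for a trained toxicity classifier. It is not the CFL implementation.

```python
from statistics import mean

# Hypothetical stand-in for a toxicity classifier: the fraction of words that
# appear in a tiny toy lexicon. A real setup would call a trained model.
TOY_TOXIC_WORDS = {"idiot", "stupid"}


def toy_toxicity(sentence: str) -> float:
    words = sentence.lower().split()
    return sum(w in TOY_TOXIC_WORDS for w in words) / len(words) if words else 0.0


def token_ate(token, sentences, replacements, score=toy_toxicity):
    """Average drop in score when `token` is swapped for counterfactual tokens."""
    effects = []
    for sentence in sentences:
        words = sentence.split()
        if token not in words:
            continue  # only sentences containing the token are "treated"
        factual = score(sentence)
        counterfactual = mean(
            score(" ".join(r if w == token else w for w in words))
            for r in replacements
        )
        effects.append(factual - counterfactual)
    return mean(effects) if effects else 0.0


sentences = ["you are stupid", "that idea is stupid", "you are kind"]
ate_stupid = token_ate("stupid", sentences, replacements=["nice", "odd"])
ate_kind = token_ate("kind", sentences, replacements=["nice", "odd"])
print(ate_stupid, ate_kind)
```

A token with a high score under this sketch is one whose presence raises the toxicity estimate relative to its counterfactuals; a token such as "kind" gets a score of zero with the toy scorer.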
Co-regularized Alignment for Unsupervised Domain Adaptation
Deep neural networks trained with large amounts of labeled data can fail to
generalize well when tested on examples from a \emph{target domain} whose
distribution differs from the training data distribution, referred to as the
\emph{source domain}. It can be expensive or even infeasible to obtain the required
amount of labeled data in all possible domains. Unsupervised domain adaptation
sets out to address this problem, aiming to learn a good predictive model for
the target domain using labeled examples from the source domain but only
unlabeled examples from the target domain. Domain alignment approaches this
problem by matching the source and target feature distributions, and has been
used as a key component in many state-of-the-art domain adaptation methods.
However, matching the marginal feature distributions does not guarantee that
the corresponding class conditional distributions will be aligned across the
two domains. We propose co-regularized domain alignment for unsupervised domain
adaptation, which constructs multiple diverse feature spaces and aligns source
and target distributions in each of them individually, while encouraging the
alignments to agree with each other with regard to the class predictions on the
unlabeled target examples. The proposed method is generic and can be used to
improve any domain adaptation method which uses domain alignment. We
instantiate it in the context of a recent state-of-the-art method and observe
that it provides significant performance improvements on several domain
adaptation benchmarks.
Comment: NIPS 2018 accepted version
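
The agreement term described in the abstract above can be sketched as a penalty on how much two classifiers disagree on unlabeled target examples. The version below assumes the penalty is the mean L1 distance between the two predicted class-probability vectors; the function names and the toy logits are hypothetical, and the full method would add classification, alignment, and diversity terms that are not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement(logits_a, logits_b):
    """Mean L1 distance between two classifiers' class probabilities."""
    dists = []
    for la, lb in zip(logits_a, logits_b):
        p, q = softmax(la), softmax(lb)
        dists.append(sum(abs(pi - qi) for pi, qi in zip(p, q)))
    return sum(dists) / len(dists)


# Hypothetical logits from two feature spaces on the same two target examples.
target_logits_1 = [[2.0, 0.0], [0.0, 2.0]]
target_logits_2 = [[2.0, 0.0], [2.0, 0.0]]
penalty = disagreement(target_logits_1, target_logits_2)
print(penalty)
```

The penalty is zero when both classifiers predict identical distributions and grows toward 2 as they place their mass on different classes, so minimizing it pushes the alignments toward consistent target predictions.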
Causal Graphs Underlying Generative Models: Path to Learning with Limited Data
Training generative models that capture rich semantics of the data and
interpreting the latent representations encoded by such models are very
important problems in unsupervised learning. In this work, we provide a simple
algorithm that relies on perturbation experiments on latent codes of a
pre-trained generative autoencoder to uncover a causal graph that is implied by
the generative model. We leverage pre-trained attribute classifiers and perform
perturbation experiments to check for the influence of a given latent variable on a
subset of attributes. Given this, we show that one can fit an effective causal
graph that models a structural equation model between latent codes taken as
exogenous variables and attributes taken as observed variables. One interesting
aspect is that a single latent variable controls multiple overlapping subsets
of attributes, unlike the conventional approach, which tries to impose full
independence. Using a pre-trained RNN-based generative autoencoder trained on a
dataset of peptide sequences, we demonstrate that the causal graph learnt by
our algorithm between various attributes and latent codes can be used to
predict a specific property for unseen sequences. We compare
prediction models trained on either all available attributes or only the ones
in the Markov blanket and empirically show that, in both the unsupervised and
supervised regimes, the predictor that relies on Markov blanket attributes
typically generalizes better to out-of-distribution sequences.
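
For readers unfamiliar with the term, the Markov blanket of a node in a directed acyclic graph consists of its parents, its children, and the other parents of its children. The sketch below computes it for a toy graph; the graph, the node names (`z1`, `z2`, `a1` to `a3`), and the function name are hypothetical and are not taken from the paper.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its direct parents.
    """
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents.get(node, ())) | children | co_parents) - {node}


# Hypothetical toy graph: latent codes z1, z2 influence attributes a1..a3.
toy_parents = {
    "z1": set(),
    "z2": set(),
    "a1": {"z1"},
    "a2": {"z1", "z2"},
    "a3": {"a2"},
}

blanket_a2 = markov_blanket(toy_parents, "a2")
blanket_z1 = markov_blanket(toy_parents, "z1")
print(sorted(blanket_a2), sorted(blanket_z1))
```

Conditioned on its Markov blanket, a node is independent of every other node in the graph, which is the usual motivation for restricting a predictor's inputs to that set.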